System and method for predicting and responding to likelihood of volatility

ABSTRACT

The invention relates generally to data analysis, and to systems and methods for the computation of the volatility of a loss. The system comprises a module for receiving a data parameter, a database for storing the parameter, a computerized predictive model, and a business logic processor for executing the predictive model. The volatility is used to, among other things, determine an appropriate work flow for handling the loss.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/875,750, filed Dec. 19, 2006, the entire contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates generally to data analysis, and to a system and method for the computation of the volatility of losses.

BACKGROUND OF THE INVENTION

The cost of resolving many insurance claims is predictable with a great deal of certainty. For other claims, the confidence an insurer can have in predicting the total cost of resolving a claim may be much lower. The propensity for a claim to exceed its predicted total resolution cost, including medical costs, is referred to herein as “claim volatility.”

Volatility in Workers' Compensation claims may manifest itself in several ways within the medical portion of the reserve. Medical conditions may worsen substantially. Treatments may result in unexpected complications. Injuries may carry low, but meaningful probabilities of secondary harm. For example, assume an employee suffers a back strain as a result of his employment. The typical conservative course of treatment would not require surgery or significant loss of time from work. In some cases, surgery and substantial loss of work occurs, and in a small portion of those cases, permanent and total disability may result with significant ongoing medical needs. In the low likelihood scenarios of additional surgery and permanent disability, the costs of the claim would be substantially higher.

Identification and triage of claims with high volatility potential would allow for improved alignment of resources towards higher risk claims, decrease the likelihood that injured parties suffer from low likelihood occurrences, and allow employers and insurers to reduce their exposure to the volatility those claims may present. Thus, there is a need in the art for methods and systems for identifying highly volatile claims. In addition, there is a need in the art for methods and systems of addressing these claims.

SUMMARY OF THE INVENTION

The invention relates generally to data analysis and to systems and methods for the computation of a likelihood that the costs incurred in relation to an insurance claim will demonstrate an increased degree of volatility in the future.

One aspect of the invention entails the use of a computer to carry out a predictive computation that estimates the likelihood that an insured loss with a given set of characteristics will exhibit volatility in the future, and thus warrant special attention. For example, a computer may employ a predictive model to estimate the likelihood that an insurance claim will be volatile. The determination of the likelihood that a claim will be volatile is preferably based upon parameters, including, for example and without limitation, the age of the insured, nature of the insurance benefit, policy limitations, medical diagnoses, pharmacy costs, the need for psychiatric treatment, expected time to return to work, an employee's capacity after returning to work, whether there is a need for physical therapy or surgery, and the particular type of damage, disability or injury. This data may be stored in a data warehouse and accessed by the computer assigned to carry out the predictive computation. The predictive computation may be based on a linear regression model, a neural network, a decision tree model, a collection of decision tree models, other statistical methods, or combinations thereof. The predictive computation may be executed at any point during the processing of a claim; however, in one embodiment, the computation is carried out after a period of time (e.g., 30, 60 or 90 days) after receiving the notice of a particular loss.

The predictive computation may be applied to new claims. It may also be applied to re-evaluate open claims on an insurance company's backlog. It may also be applied at multiple stages during the life of the processing of a claim as more data becomes available. Periodic recomputation may identify volatile claims that were not initially identified as such based upon the data available at earlier points in time. Recomputation may also be warranted where the circumstances of a claim change unexpectedly. Periodic recomputation may also result in the re-classification of a previously volatile claim as stable based on new and changed data.

According to another aspect, the invention relates to a method of administering insurance claims based on the results of the predictive computation to more efficiently process claims. The insurance company may, for example, adjust the level of oversight with respect to the processing of claims in order to obtain the best outcome. Processing the claim may also include managing the claim and assigning resources to the claim to have an increased impact on a claimant's outcome. For example, based on each claim's predicted likelihood of being a volatile claim, the insurer can assign claims to claims handlers with a skill set and level of experience commensurate with the claim, provide an appropriate level of medical review and treatment, and/or provide an appropriate level of vocational counseling. Medical review and treatment may include, without limitation, review and/or treatment from physical therapists, occupational therapists, vocational rehabilitation providers, physicians, nurses, nurse case managers, psychologists, alternative medical practitioners, chiropractors, research specialists, drug addiction treatment specialists, independent medical examiners, and social workers. The selection of the level of review and/or treatment may include a selection of a particular provider having the skills, experience, and domain knowledge applicable to the claim, an aggressiveness of treatment or review, and/or frequency of treatment or review. In one embodiment, the method includes recommending a specific course of treatment based on a detected change in the volatility of a claim. Recommended treatments are based on the specific condition and medical history of the injured party as well as, at least in part, historical treatment statistics, expert advice, and/or on appropriate clinical trials. Such recommendations may be provided, for example, to claims adjusters and/or to medical or case management personnel to review and consider based on the specific condition and injured party involved in the claim.

In addition, the insurance company may employ the results of the predictive computation to determine whether the level of non-compensatory expenses is appropriate for a given claim. The insurance company may also use the results to determine a level of non-medical investigation or analysis to apply to the claim. For example, the results may be used to determine if a private investigator or other vendor or expert should be engaged to further investigate the circumstances surrounding a claim. Such investigation might be used to detect fraud, inform subrogation determinations, and recommend treatments. The results also may be used to assign additional resources, including but not limited to actuaries, statisticians, or other research analysts to review the claim.

The insurance company, in various embodiments, may also make information pertaining to the volatility likelihoods of groups of claims available for the use of employees who are responsible for setting the insurance company's reserves. Any of the uses described above may be applied to all claims or limited to claims that meet a specified likelihood level (e.g. a 90% likelihood of a claim being volatile, or a 75% likelihood of a claim being volatile).

BRIEF DESCRIPTION OF THE FIGURES

The foregoing discussion will be understood more readily from the following detailed description of the invention with reference to the following figures.

FIG. 1 is a diagram illustrating a system for claim administration based upon a claim's predicted likelihood of volatility, according to one embodiment of the invention.

FIG. 2 is a flow chart of a method for generating a computerized model for predicting claim volatility according to an illustrative embodiment of the invention.

FIG. 3 is a plot of medical expenditures associated with a volatile claim, according to an illustrative embodiment of the invention.

FIG. 4 is a flow chart of a method for deriving a target variable for classifying prior claims with respect to their volatility, according to an illustrative embodiment of the invention.

FIG. 5 is a flow chart of a method of training a volatility predictive model, according to an illustrative embodiment of the invention.

FIG. 6 is a flow chart of a method of applying a trained volatility predictive model, according to an illustrative embodiment of the invention.

FIG. 7 is a plot of performance data derived from validating a volatility predictive model generated and trained according to an illustrative embodiment of the invention.

FIG. 8 is a block diagram of a computing architecture suitable for implementing various ones of the computing devices depicted in FIG. 1.

FIG. 9 is a flowchart of a method of claim administration based upon a claim's predicted likelihood of exhibiting volatility, according to one embodiment of the invention.

FIG. 10 is a diagram illustrating a volatility-based claims reserving strategy according to an illustrative embodiment of the invention.

FIG. 11 is a flowchart of a method for underwriting insurance coverage based on a likelihood of a customer experiencing volatile claims.

DESCRIPTIONS OF CERTAIN ILLUSTRATIVE EMBODIMENTS

To provide an overall understanding of the invention, certain illustrative embodiments will now be described. However, it will be understood by one of ordinary skill in the art that the systems and methods described herein may be adapted and modified as is appropriate for the application being addressed, that they may be employed in other suitable applications, and that such other additions and modifications will not depart from the scope hereof.

FIG. 1 is a diagram illustrating a system for claim administration based upon a claim's predicted volatility, according to one embodiment of the invention. The system contains a data warehouse 101, a business logic processor 103, a predictive model 104, a network 105, a client terminal 107, and a workflow processor 111.

The data warehouse 101 is the main electronic depository of an insurance company's current and historical data. The data warehouse 101 includes one or more interrelated databases 109 that store information relevant to insurance data analysis. The interrelated databases 109 store both structured and unstructured data. Structured data includes data stored in defined data fields, for example, in a data table. Unstructured data includes raw information, including, for example, computer readable text documents, document images, audio files, video files, and other forms of raw data.

Databases in the interrelated databases 109 may for example store data in a relational database, in various data fields keyed to various identifiers, such as, without limitation, customer, data source, geography, or business identifier (such as Standard Industry Classification code). The information stored in the data warehouse 101 is obtained through communications with customers, agents, sensors monitoring insured property, vendors, data and text mining software applications, and third party data providers and investigators. In other implementations, use of the data warehouse can be replaced with a more traditional database application without departing from the scope of the invention.

The business logic processor 103 includes one or more computer processors, a memory storing the predictive model 104, and other hardware and software for executing the predictive model 104. More specifically, the software may be computer readable instructions, stored on a computer readable medium, such as a magnetic, optical, magneto-optical, holographic, integrated circuit, or other form of non-volatile memory. The instructions may be coded, for example, using C, C++, JAVA, SAS or other programming or scripting language. To be executed, the respective computer readable instructions are loaded into Random Access Memory associated with the business logic processor 103.

The predictive model 104 is used by the business logic processor 103 to estimate the likelihood that a claim will exhibit increased volatility in comparison to other claims. Volatility may be measured in various ways. In one implementation, the volatility estimated is the volatility of medical payments incurred in relation to a claim. A claim is considered to be volatile if it has the potential to result in significant non-linear increases in medical expenditures. In other implementations, volatility may be measured in other ways. For example, in various alternative embodiments, instead of focusing only on medical costs, volatility can be viewed as the potential for significant non-linear increases in other individual cost components, or combinations thereof, associated with a claim, including but not limited to medical review and management, third party investigation expenses, and claim oversight costs. In alternative embodiments, the business logic processor may evaluate the likelihood that costs associated with a claim will be stable, rather than volatile.

The predictive model 104 may be a linear regression model, a neural network, a decision tree model, or a collection of decision trees, for example, and combinations thereof. The predictive model 104 may be stored in the memory of the business logic processor 103, or may be stored in the memory of another computer connected to the network 105 and accessed by the business logic processor 103 via the network 105.

The predictive model 104 preferably takes into account a large number of parameters, such as, for example, some or all of the parameters listed in Table 1, below. The evaluation period referred to in the table may be, for example, and without limitation, the first 45, 90, or 120 days after a first notice of loss is received by the insurance company. Alternatively, the evaluation periods may be regular periods, for example, months, quarters, or years.

TABLE 1
Illustrative Variables for Predictive Models

Medical invoice totals for the following (during evaluation periods):
    Pharmacy
    Doctor's office
    Inpatient Hospital
    Outpatient Hospital
    Emergency Room
    Ambulatory Surgical Center
    Nursing Facility
    Ambulance
    Inpatient Psychiatric Facility
    Community Mental Health Center

Count of visits of the following type (during evaluation periods):
    Emergency
    Critical care
    Diagnostic
    Physical therapy
    Surgery
    Anesthesia
    Radiology

Whether primary injury is one of the following types:
    Nervous
    Back sprain
    Fracture
    Dislocation
    Open wounds
    Musculoskeletal

Compensation coverage code (varies by state)
Network penetration (in network versus out of network medical spend)
Estimated incurred (reserved amount) at end of evaluation period
Estimated total medical spend
Accident state
Claimant age
Nature of benefit code
Business unit and business group
Estimated indemnity payment
Prior claimant injuries and/or medical conditions

Preferably, the predictive model 104 is trained on a collection of data known about prior insurance claims and their ultimate disposition, including, for example, and without limitation, the types of costs described above. In various embodiments, the particular data parameters selected for analysis in the training process are determined by using regression analysis or other statistical techniques, such as posterior probability modeling, known in the art for identifying relevant variables in multivariable systems. The results of such analysis can also be used to inform claims adjusters as to the importance of various types of data to encourage inclusion of more data related to the more relevant parameters in free text fields or structured data fields of claims reporting forms. One particular predictive model suitable for use as the predictive model 104, and methods for generating such a predictive model are described further in relation to FIGS. 2-7.

FIG. 2 is a flowchart of a method of generating and validating a predictive model for predicting claim volatility according to an illustrative embodiment of the invention. In step 202, a quantitative “target variable” is determined that may be used to categorize a collection of observed claim data into those that exhibit volatile behavior and those that do not. For example, a target variable may be the result of a function, which can then be compared against a threshold value. Claims that have a target variable value that exceeds the threshold value may be considered stable or volatile, depending on how the function and threshold are defined. An illustrative process for determining a volatility target variable is described in relation to FIG. 4. The actual predictive model is created in step 203 from a collection of observed past claim data and the target variable. In a preferred embodiment, the predictive model has the form of one or more decision trees. The decision tree(s) may be used to predict the volatility of future claim data. An illustrative process for creating the decision tree(s) is described in relation to FIG. 5. An illustrative method for using the decision tree(s) to generate a volatility likelihood prediction is described in relation to FIG. 6. Finally, the predictive model is validated in step 204. Illustrative results of the validation of an exemplary model generation process are shown in FIG. 7.

In the context of workers' compensation insurance, volatility is a measure that captures unexpected increases in claim transactions. Claim transactions may be, for example, monthly medical expenditures associated with a claim. Medical expenditures often constitute the component of claim costs that is most likely to exhibit volatility in a workers' compensation claim. Although the model generation process is described below based on monthly medical expenditure data, it is understood that other types of claim transactional data may also be used. Volatile behavior includes, but is not limited to, non-linear growth, including, for example, sudden spikes, exponential growth, and/or stepwise increases, in the monthly medical expenditures associated with a claim.

FIG. 3 shows a graph 300 of the monthly medical expenditure of an illustrative claim that exhibits volatile behavior. Graph 300 shows the amount of medical expenditure per month of the claim over a ten year period. Spikes 301 and 302 indicate unexpected increases in monthly medical expenditure. These increases in spending may, for example, correspond to a costly surgical procedure, unexpected complications arising from routine treatment, for example, infection, or another medical cost that may be out of the norm. The overall exponentially increasing trend exhibited by graph 300 may, for example, reflect an injured party's progressively worsening medical condition.

FIG. 4 is a flowchart of a method of determining the target variable for volatility, as mentioned above in connection with step 202 of FIG. 2. As mentioned above, the target variable is a quantitative function describing a level of volatility. In one embodiment of the invention, it may be used to categorize the past behavior of a claim as volatile or nonvolatile in a binary fashion. In other embodiments, it may assign to the past behavior of a claim a degree of volatility that is more finely defined, such as high, medium, or low volatility. In such embodiments, the target variable value may be compared against multiple thresholds to properly classify an associated claim.

Defining the target variable begins, at step 401, by collecting monthly medical expenditure data for each of a group of claims. In one embodiment, the monthly medical expenditure data for each claim may include expenditure data for a time period ranging from one month to 15 years or more. The claims may be currently pending claims and/or historical claims that have finally resolved and been closed. The monthly medical expenditure of each claim may take the form of a time series, such as the time series plotted in graph 300 of FIG. 3.

At step 402, the expenditure data for each claim or for groups of claims is analyzed to produce values for volatility-indicative variables, i.e., statistical characteristics deemed to be evidence of volatility. Illustrative volatility-indicative variables include, in one implementation, the mean (μ) monthly medical expenditure for a claim, the semi-quartile range (SQ) of monthly medical expenditures for the claim, and the correlation coefficient (ρ) of the monthly medical expenditure of each claim over a particular time interval (indicating the degree to which medical expenditures in a given month correlate to the expenditures in one or more preceding months). At step 403, fuzzy membership functions may be applied to the relationship between each of the statistical parameters and “the degree of volatility” as seen in the historic data. Fuzzy membership functions are a subset of fuzzy logic operations. Such functions are described in Zadeh, L. A., “Fuzzy Sets,” Information and Control, Vol. 8, 1965, 338-353, which is incorporated herein by reference. For example, the mean, semi-quartile range, and correlation coefficient of the monthly medical expenditure of a claim may be used to form fuzzy variables μ_fuzzy, SQ_fuzzy, and ρ_fuzzy. At step 404, the fuzzy variables are aggregated to determine a single variable that is representative of the degree of volatility. One example of a suitable aggregation function is the minimum of the individual fuzzy variable values. At step 405, the target variable value is decided based on threshold values for the aggregated fuzzy variable. For example, in one implementation, claims with an aggregate fuzzy score of more than 0.135 are considered volatile and the remaining claims are considered nonvolatile. The target variable for a claim is assigned a value of 1 for volatile claims and 0 for nonvolatile claims.
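Steps 402 through 405 can be illustrated with a minimal Python sketch. Only the three statistics, the minimum aggregation, and the 0.135 threshold come from the description above. The piecewise-linear `ramp` membership function, its breakpoints in `BOUNDS`, the direction of each membership function, and the use of a one-month lag for the correlation coefficient are assumptions made purely for illustration.

```python
from statistics import mean, quantiles

VOLATILITY_THRESHOLD = 0.135  # aggregate fuzzy score cutoff quoted in the text


def ramp(x, lo, hi):
    """Hypothetical membership function: 0 at or below lo, 1 at or above hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def semi_quartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2.0


def lag1_correlation(values):
    """Pearson correlation between each month and the month before it."""
    x, y = values[:-1], values[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series carries no correlation signal
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def target_variable(monthly_spend, bounds):
    """Return (label, score): label is 1 for volatile, 0 for nonvolatile."""
    mu_fuzzy = ramp(mean(monthly_spend), *bounds["mean"])
    sq_fuzzy = ramp(semi_quartile_range(monthly_spend), *bounds["sq"])
    rho_fuzzy = ramp(lag1_correlation(monthly_spend), *bounds["rho"])
    score = min(mu_fuzzy, sq_fuzzy, rho_fuzzy)  # step 404: minimum aggregation
    return (1 if score > VOLATILITY_THRESHOLD else 0), score


# Placeholder (lo, hi) breakpoints; real ones would be fit to historic data.
BOUNDS = {"mean": (0.0, 1000.0), "sq": (0.0, 500.0), "rho": (-1.0, 1.0)}

flat_claim = [100.0] * 12
growing_claim = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0, 6400.0, 12800.0]
flat_label, flat_score = target_variable(flat_claim, BOUNDS)
growing_label, growing_score = target_variable(growing_claim, BOUNDS)
```

With these placeholder breakpoints, the flat claim is labeled nonvolatile because its semi-quartile range is zero, while the doubling claim is labeled volatile.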

The specific target variable (including fuzzy membership functions, an aggregation operation, and a corresponding threshold) used in generating the predictive model in step 203 is chosen by applying multiple candidate target variables to several training data sets. The target variable is selected according to the minimax criterion. That is, the selected target variable is that which minimizes the maximum error in application of the target variable to the training data sets. Each set of training data consists of medical expenditure data for multiple claims. The claims in the training data sets are manually prescreened for volatility by a human expert. An error occurs when the volatility category of the claim as determined by the target variable is different from that determined by the human expert. In some implementations, the human expert identifies each claim as either volatile or stable. In such implementations, all misidentifications made by applying a potential target variable are weighted equally. In other implementations, the human expert also assigns each claim a degree of volatility. In such implementations, the magnitude of an error associated with a misidentification is based on the degree of volatility assigned to the claim by the human expert. For example, misidentifying a highly volatile claim as stable would yield a larger error than the misidentification of a slightly volatile claim. Once the fuzzy membership functions and aggregation operations have been determined, a sample of claim trends with known volatility may be used to validate the effectiveness of the target variable.
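The minimax selection can be sketched as follows for the equally weighted case, in which every disagreement with the human expert counts as one error. The function names, the use of an error rate per training set, and the toy candidates (simple thresholds on a precomputed aggregate fuzzy score) are hypothetical and serve only to show the selection rule.

```python
def max_error_rate(classify, training_sets):
    """Worst-case share of claims on which classify disagrees with the expert."""
    worst = 0.0
    for claims in training_sets:
        errors = sum(1 for features, expert_label in claims
                     if classify(features) != expert_label)
        worst = max(worst, errors / len(claims))
    return worst


def select_target_variable(candidates, training_sets):
    """Minimax: pick the candidate whose maximum error across sets is smallest."""
    return min(candidates,
               key=lambda name: max_error_rate(candidates[name], training_sets))


# Each claim is (aggregate fuzzy score, expert label); values are made up.
set_a = [(0.05, 0), (0.10, 0), (0.20, 1), (0.50, 1)]
set_b = [(0.12, 0), (0.14, 1), (0.30, 1), (0.02, 0)]

# Hypothetical candidates that differ only in their threshold.
candidates = {
    "threshold 0.08": lambda score: 1 if score > 0.08 else 0,
    "threshold 0.135": lambda score: 1 if score > 0.135 else 0,
    "threshold 0.25": lambda score: 1 if score > 0.25 else 0,
}

best = select_target_variable(candidates, [set_a, set_b])
```

Where the expert also assigns a degree of volatility, the per-claim error of 1 could be replaced by a weight derived from that degree; the selection rule itself would stay the same.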

The process described by flowchart 400 retrospectively describes the volatility of a set of observed past claims. In contrast, flowchart 500 of FIG. 5 details the process of creating a predictive model that may be used to predict future volatility in a claim. In one embodiment of the invention, the predictive model generated according to the method depicted in FIG. 5 includes a collection of decision trees. The parameters used in the decision tree models may include some or all of the parameters listed in Table 1, above.

According to the flowchart 500, the process of building a predictive model begins with collecting a set of training data, at step 501. The training data set consists of data for previously handled claims. At step 502, the volatility of each claim in this data set may be determined by computing the claim's target variable as described in relation to FIG. 4. The claims in the training data set are categorized into a volatile and nonvolatile group. Preferably, a large number of claims, for example, more than 10,000 or more than 100,000 are included in the training data set.

At step 503, a number, N, of stratified samples is generated from the training data set. The samples are stratified in that each sample has the same proportion of volatile to nonvolatile claims. In one embodiment, the volatile group of claims may be oversampled to create samples in which the percentage of claims that are volatile in each sample exceeds the percentage of volatile claims in the full training data set. For example, experimental data suggests that on the order of 1% of claims demonstrate high levels of volatility. However, to adequately train the collection of decision trees referred to above, it may be beneficial to ensure 10% of the claims in each sample are volatile. Thus, each volatile claim may be included in multiple samples.
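One way to picture step 503 is the sketch below, which draws N samples with a fixed 10% share of volatile claims. It assumes sampling with replacement within each group, a fixed sample size, and a seeded random generator for repeatability; none of these details is prescribed by the description, and the function and parameter names are illustrative.

```python
import random


def stratified_samples(volatile, nonvolatile, n_samples, sample_size,
                       volatile_share=0.10, seed=0):
    """Draw n_samples samples that all share the same volatile proportion."""
    rng = random.Random(seed)
    n_vol = round(sample_size * volatile_share)
    n_non = sample_size - n_vol
    samples = []
    for _ in range(n_samples):
        # Sampling with replacement lets a scarce volatile claim recur,
        # both within one sample and across samples (oversampling).
        picked = rng.choices(volatile, k=n_vol) + rng.choices(nonvolatile, k=n_non)
        rng.shuffle(picked)
        samples.append(picked)
    return samples


# Toy training set in which 1% of claims are volatile.
volatile_claims = [("volatile", i) for i in range(10)]
nonvolatile_claims = [("nonvolatile", i) for i in range(990)]

samples = stratified_samples(volatile_claims, nonvolatile_claims,
                             n_samples=5, sample_size=200)
volatile_counts = [sum(1 for kind, _ in s if kind == "volatile") for s in samples]
```

Here each of the five samples holds 20 volatile claims out of 200 even though only 10 distinct volatile claims exist, so individual volatile claims necessarily appear more than once.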

For each of the N samples generated in step 503, a decision tree is generated. In one implementation, each leaf node in each decision tree has an associated likelihood value, for example, a posterior probability, that a claim will exhibit volatility in the future. The output of processing a claim with the decision tree is the likelihood value associated with the leaf node into which the claim is categorized. The N decision trees may be generated independently from each other and may be formed using one or more algorithms known in the art. There are a number of such methods for generating decision trees from a set of training data, including Gini index and entropy reduction based methods. These methods are described in, for example, Barry De Ville (2006), Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner, SAS Press Series. Each of the N decision trees may be generated using a different or the same method. The particular tree generation method used for each sample may be selected or optimized based on subsequent model validation results. For example, each of the trees may be generated according to two or more models. Then, various combinations of the trees may be compared to see which yields the best results upon validation. The purpose of generating multiple trees from multiple data samples is to combat the tendency of decision tree models to change substantially in response to small changes in the training data. The predictive model also includes an aggregation function that can be applied to the outputs of the decision trees in the model to yield a final volatility likelihood score. In various implementations, the aggregation function serves to aggregate posterior probabilities expressed by individual decision trees. Suitable aggregation functions include, for example, voting methods that are proportion-based or average-based. In one particular implementation, the aggregation function is a simple averaging of posterior probabilities from the N decision trees.
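The averaging implementation can be sketched in a few lines. The nested-dictionary tree layout, the feature names, the split thresholds, and the leaf probabilities below are all invented for illustration; only the idea of reading a posterior probability from the reached leaf of each tree and averaging across trees is taken from the description.

```python
def leaf_probability(tree, claim):
    """Walk one tree and return the likelihood stored at the reached leaf."""
    node = tree
    while "p" not in node:  # internal nodes carry a split, leaves carry "p"
        go_left = claim[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["p"]


def volatility_score(trees, claim):
    """Aggregate by simple averaging of the per-tree leaf probabilities."""
    return sum(leaf_probability(tree, claim) for tree in trees) / len(trees)


# Two hand-built trees with hypothetical splits and leaf probabilities.
tree_a = {
    "feature": "pharmacy_total", "threshold": 500.0,
    "left": {"p": 0.02},
    "right": {
        "feature": "surgery_visits", "threshold": 0,
        "left": {"p": 0.10},
        "right": {"p": 0.40},
    },
}
tree_b = {
    "feature": "claimant_age", "threshold": 50,
    "left": {"p": 0.05},
    "right": {"p": 0.25},
}

claim = {"pharmacy_total": 1200.0, "surgery_visits": 2, "claimant_age": 57}
score = volatility_score([tree_a, tree_b], claim)
```

For this claim the first tree reaches the 0.40 leaf and the second the 0.25 leaf, so the averaged score is 0.325.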

FIG. 6 is a flowchart of a method of using the predictive model generated in FIG. 5 to obtain a future volatility prediction on a particular test claim. At step 602, values for the decision tree parameters for the test claim are input into the decision trees. The decision trees are processed to produce likelihoods that the test claim will be volatile (as defined by the target variable). At step 603, the likelihoods from the decision trees are aggregated to produce an overall likelihood score, as described above. At step 604, the aggregated likelihood score is output by the predictive model for further processing.

In one example, a predictive model was generated and validated in accordance with the methodology described above. The model was generated using a sample data set including several hundred thousand historical claims. Approximately 1% of these claims were identified as being volatile based on the classification methodology described above in relation to FIG. 4.

Ten stratified samples were obtained from the training data set using a bootstrap sampling methodology with replacement. The group of volatile claims was oversampled so that the ratio of volatile to nonvolatile claims in each stratified sample was one to ten. Ten decision trees were generated using these ten stratified samples. Seventy percent of the claims in each sample set were used to train a respective predictive model. The remaining thirty percent of the claims were used to validate that model. To generate each tree, one tree was selected from multiple candidate trees generated using various combinations of parameters and decision tree methodologies. In this specific example, six of the trees were selected to be Gini index based trees and the remaining trees were entropy reduction based trees. The volatility likelihood values produced by the trees were aggregated together by computing their mean.
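For readers unfamiliar with the two split criteria named in this example, the sketch below computes the Gini index and the entropy of a set of binary volatility labels, and the reduction in either measure achieved by a candidate split. It illustrates the general criteria only and is not a reconstruction of the tree-building software used in the example; the function names are hypothetical.

```python
from math import log2


def gini(labels):
    """Gini index of binary labels (1 = volatile, 0 = nonvolatile)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def entropy(labels):
    """Shannon entropy, in bits, of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0  # a pure node has no uncertainty
    return -(p * log2(p) + (1.0 - p) * log2(1.0 - p))


def impurity_reduction(impurity, parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


parent = [1, 1, 0, 0]
left, right = [1, 1], [0, 0]  # a split that separates the classes perfectly
gini_gain = impurity_reduction(gini, parent, left, right)
entropy_gain = impurity_reduction(entropy, parent, left, right)
```

A tree builder would evaluate many candidate splits at each node and keep the one with the largest reduction under the chosen criterion.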

Next, an eleventh sample of about ten percent of the total sample claims, also selected using replacement, was obtained. 70% of these claims were used to train the aggregate model. The remaining 30% were used to conduct a final validation of the model. The validation results are depicted in FIG. 7. In FIG. 7, curve 701 shows the cumulative percent captured response when the model is used to predict the likelihood of future volatility using the training claims in the eleventh sample data set. Curve 702 results from using the model to predict future volatility of the validation data set. Since the validation data set is independent of the data used to generate the model, curve 702 is a representation of the actual effectiveness of the model. As seen in both curves 701 and 702 on FIG. 7, about 75% of the known volatile claims processed by the model received outputs in the top deciles of claim outputs.
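A cumulative captured response curve of the kind plotted in FIG. 7 can be computed as sketched below: claims are ranked by model score, and for each decile the share of all known volatile claims found at or above that decile is recorded. The function name, the equal-width binning, and the toy scores and labels are assumptions; the figures reported above are not reproduced here.

```python
def cumulative_captured_response(scores, labels, n_bins=10):
    """Share of all volatile claims captured in the top 1..n_bins score bins."""
    total_volatile = sum(labels)
    if total_volatile == 0:
        raise ValueError("at least one volatile claim is required")
    # Rank claims from highest to lowest model score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = []
    for b in range(1, n_bins + 1):
        cutoff = (b * len(ranked)) // n_bins
        captured = sum(label for _, label in ranked[:cutoff])
        curve.append(captured / total_volatile)
    return curve


# Ten toy claims with distinct scores; three are known to be volatile.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = cumulative_captured_response(scores, labels)
```

In this toy data two of the three volatile claims fall in the top two deciles and all three fall in the top four, so the curve rises steeply and then flattens at 1.0.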

The model generation and application processes described above are merely illustrative methods for generating and applying a model for use in the process described herein. Other model generation and respective application processes as well as other types of models may be employed without departing from the scope of the invention. For example, in alternative implementations, the predictive model 104 can be based on expert systems or other systems known in the art for addressing problems with large numbers of variables. The model may be generated by the business logic processor 103, another computing device operated by the insurance company, or by a computing device operated by a third party having access to the insurance company's prior claims data.

The predictive model 104 may be updated from time to time as an insurance company receives additional claim data to use as a baseline for building the predictive model 104. The updating includes retraining the model based on the updated data using the previously selected parameters. Alternatively, or in addition, updating includes carrying out the model generation process again based on the new data.

Referring back to FIG. 1, the network 105 enables the transfer of claim data between the data warehouse 101, the business logic processor 103, the client computer 107, the business workflow processor 111, and third party suppliers or vendors of data. The network includes a local area network as well as a connection to the Internet.

The client terminal 107 includes a computer that has a CPU, display, memory and input devices such as a keyboard and mouse. The client terminal 107 also includes a display and/or a printer for outputting the results of the analysis carried out by the predictive model 104. The client terminal 107 also includes an input module where a new claim may be filed, and where information pertaining to the claim may be entered, such as a notice of loss, for example. In addition to being implemented on the client terminal 107, or in the alternative, the input module may be implemented on other insurance company computing resources on the network 105. For example, the input module may be implemented on a server on the network 105 for receiving claims over the Internet from one or more websites or client applications accessed by insurance company customers, company agents, or third party preprocessors or administrators. The input module is preferably implemented as computer readable and executable instructions stored on a computer readable medium for execution by a general or special purpose processor. The input module may also include associated hardware and/or software components to carry out its function. For example, for implementations of the input module in which claims are entered manually based on the notice of loss being received telephonically, the input module preferably includes a voice recording system for recording, transcribing, and extracting structural data from such notices.

The workflow processor 111 includes one or more computer processors and memory storing data pertaining to claim handlers, supervisors, medical reviewers, medical providers, medical provider supervisors, private investigators, and other vendors. Stored information may include, without limitation, experience, skill level, reputation, domain knowledge, and availability. The workflow processor 111 also includes other hardware and software used to assign a claim to at least one of a claim handler, supervisor, medical reviewer, medical provider, medical provider supervisor, legal services provider, and independent investigator by the business logic processor 103. For example, in one implementation, the workflow processor 111 assigns more aggressive medical care and review to claims having higher likelihoods of being volatile claims, thereby applying resources to those that might benefit the most. The level of medical care and/or review management may be tiered. Medical care and review may include, without limitation, review and/or treatment from physical therapists, occupational therapists, vocational rehabilitation providers, physicians, nurses, nurse case managers, psychologists, alternative medical practitioners, chiropractors, research specialists, drug addiction treatment specialists, independent medical examiners, and social workers. The selection of the level of review and/or care may include a selection of a particular provider having the skills, experience, and domain knowledge applicable to the claim, an aggressiveness of treatment or review, and/or frequency of treatment or review. The workflow processor 111 or the business logic processor 103 may also have software configured to determine a general expense tolerance for a claim, i.e., a tolerance for expending resources on costs not associated with compensating a claimant or covered individual.

As an alternative to the illustrated FIG. 1, the physical components of the data warehouse 101, client computer 107, business logic processor 103, predictive model 104 and workflow processor 111 may be housed within the same computing device. As another alternative, the functionality of the business logic processor 103 and workflow processor 111 may be implemented on a single computing device.

FIG. 8 is a block diagram of a computing architecture suitable for implementing various ones of the computing devices depicted in FIG. 1, including, for example, the business logic processor 103, the client computer 107, and the workflow processor 111.

Computer 801 comprises at least one central processing unit (CPU) 802, at least one read-only memory (ROM) 803, at least one communication port or hub 804, at least one random access memory (RAM) 805, and one or more databases or data storage devices 806. All of these latter elements are in communication with the CPU 802 to facilitate the operation of the computer 801. The computer 801 may be configured in many different ways. For example, computer 801 may be a conventional standalone computer or, alternatively, the function of computer 801 may be distributed across multiple computing systems and architectures.

Computer 801 may be configured in a distributed architecture, wherein databases and processors are housed in separate units or locations. Some such units perform primary processing functions and contain at a minimum, a general controller or a processor 802, a ROM 803, and a RAM 805. In such an embodiment, each of these units is attached to a communications hub or port 804 that serves as a primary communication link with other servers 807, client or user computers 808 and other related devices 809. The communications hub or port 804 may have minimal processing capability itself, serving primarily as a communications router. A variety of communications protocols may be part of the system, including but not limited to: Ethernet, SAP, SAS™, ATP, BLUETOOTH™, GSM and TCP/IP.

The CPU 802 comprises a processor, such as one or more conventional microprocessors and one or more supplementary co-processors such as math co-processors. The CPU 802 is in communication with the communication port 804 through which the CPU 802 communicates with other devices such as other servers 807, user terminals 808, or devices 809. The communication port 804 may include multiple communication channels for simultaneous communication with, for example, other processors, servers or client terminals. Devices in communication with each other need not be continually transmitting to each other. On the contrary, such devices need only transmit to each other as necessary, may actually refrain from exchanging data most of the time, and may require several steps to be performed to establish a communication link between the devices.

The CPU 802 is also in communication with the data storage device 806. The data storage device 806 may comprise an appropriate combination of magnetic, optical and/or semiconductor memory, and may include, for example, RAM, ROM, flash drive, an optical disc such as a compact disc and/or a hard disk or drive. The CPU 802 and the data storage device 806 each may be, for example, located entirely within a single computer or other computing device; or connected to each other by a communication medium, such as a USB port, serial port cable, a coaxial cable, an Ethernet type cable, a telephone line, a radio frequency transceiver or other similar wireless or wired medium or combination of the foregoing. For example, the CPU 802 may be connected to the data storage device 806 via the communication port 804.

The data storage device 806 may store, for example, (i) a program (e.g., computer program code and/or a computer program product) adapted to direct the CPU 802 in accordance with the present invention, and particularly in accordance with the processes described in detail hereinafter with regard to the CPU 802; and (ii) databases adapted to store information required by the program. Suitable databases include data warehouse 101 of FIG. 1.

The program may be stored, for example, in a compressed, an uncompiled and/or an encrypted format, and may include computer program code. The instructions of the program may be read into a main memory of the processor from a computer-readable medium other than the data storage device 806, such as from a ROM 803 or from a RAM 805. While execution of sequences of instructions in the program causes the processor 802 to perform the process steps described herein, hard-wired circuitry may be used in place of, or in combination with, software instructions for implementation of the processes of the present invention. Thus, embodiments of the present invention are not limited to any specific combination of hardware and software.

Suitable computer program code may be provided for performing numerous functions such as determining a volatility score and adjusting a workflow based on a determined volatility score. The program also may include program elements such as an operating system, a database management system and “device drivers” that allow the processor to interface with computer peripheral devices 809 (e.g., a video display, a keyboard, a computer mouse, etc.).

The term “computer-readable medium” as used herein refers to any medium that provides or participates in providing instructions to the processor of the computing device (or any other processor of a device described herein) for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media include, for example, optical, magnetic, or opto-magnetic disks, such as memory. Volatile media include dynamic random access memory (DRAM), which typically constitutes the main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM or EEPROM (electrically erasable programmable read-only memory), a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the processor 802 (or any other processor of a device described herein) for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer 808. The remote computer 808 can load the instructions into its dynamic memory and send the instructions over an Ethernet connection, cable line, or even telephone line using a modem. A communications device 804 local to a computing device (or, e.g., a server) can receive the data on the respective communications line and place the data on a system bus for the processor. The system bus carries the data to main memory, from which the processor retrieves and executes the instructions. The instructions received by main memory may optionally be stored in memory either before or after execution by the processor. In addition, instructions may be received via a communication port as electrical, electromagnetic or optical signals, which are exemplary forms of wireless communications or data streams that carry various types of information.

As previously discussed with reference to FIG. 1, servers may also interact with and/or control one or more user devices 809, such as displays and printers, or remote computers 808 such as, e.g., agent terminals 108 and carrier terminals 122. The terminals may include any one or a combination of a personal computer, a laptop, a personal digital assistant, a mouse, a keyboard, a computer display, a touch screen, LCD, voice recognition software, or other input/output devices required to implement the above functionality.

FIG. 9 is a flowchart illustrating a method of claim administration based upon a claim's predicted likelihood of exhibiting cost volatility, according to one embodiment of the invention. The method begins at step 901, when an insurance company receives a notice of loss. The notice of loss may be received from a claimant, from a pre-processor, or from a third party administrator, for example. The notice of loss may be received by telephone, mail, e-mail, web page, web server, or through other data communications over the Internet. In addition, a notice of loss may be received directly or indirectly from sensors monitoring an insured property via a wireless or wired network connection.

Next, at step 903, the claim is assigned to a first employee of the company, or agent associated therewith, for the collection of basic data relating to the claim. At step 905, the claim is assigned to a second employee for processing. This step may be manual. For example, the first employee may review the collected data and make a judgment as to which second employee has the most appropriate skill set and experience level for handling the claim. Alternatively, the assignment may be automatic. For example, a computer may assign the claim to the second employee based upon a series of computations relating to pre-set criteria.

After a period of time in which additional claim characteristics are collected by the employee assigned to process the claim (e.g., 30, 45, 60, or 90 days after the notice of loss), the business logic processor 103 computes a predictive estimate of the likelihood that the claim will exhibit volatility. The business logic processor 103 outputs a value indicating the likelihood that the claim will be volatile. For example, the likelihood may take the form of a raw score, a probability value, i.e., a numeric value between zero and one or between zero percent and one hundred percent, or a tier or classification value (e.g., high likelihood, medium likelihood, or low likelihood, or level 1, level 2, level 3, level 4, or level 5). The likelihood value may also be a relative value comparing the likelihood of the claim being volatile with the likelihood that other claims being processed will be volatile. This relative value may be an absolute ranking of the claim with respect to other pending claims, or it may be a value indicating a tranche of claims, for example, the top 5%, 10%, or 90% of claims, or top 5, top 10, or top 90 claims most likely to be volatile claims. The output likelihood value can then be used for customized processing of the claim. A data file or report may also be generated for each claim or for a group of claims, which may include data parameters associated with the characteristics of the claim or group of claims, as well as their likelihood of being volatile and the ranking with respect to other pending claims. This report may then be forwarded, for example, to the client terminal 107.
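The alternative output forms described above (a tier, an absolute ranking, or a tranche) may be illustrated by the following sketch. It is offered only as an example under stated assumptions: the tier cutoffs of 0.33 and 0.66, the function names, and the representation of pending claims as a mapping from claim identifier to probability are hypothetical and are not taken from the specification.

```python
import math


def tier_for(probability, cutoffs=(0.33, 0.66)):
    """Map a probability in [0, 1] to a coarse tier (cutoffs are hypothetical)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between zero and one")
    low, high = cutoffs
    if probability >= high:
        return "high"
    if probability >= low:
        return "medium"
    return "low"


def rank_claims(likelihoods):
    """Return {claim_id: rank}, where rank 1 is the claim most likely to be volatile."""
    ordered = sorted(likelihoods, key=lambda cid: likelihoods[cid], reverse=True)
    return {cid: position + 1 for position, cid in enumerate(ordered)}


def top_tranche(likelihoods, fraction):
    """Return the ids of the claims in the top `fraction` (e.g. 0.05 for the top 5%)."""
    ranks = rank_claims(likelihoods)
    k = math.ceil(fraction * len(likelihoods))
    return {cid for cid, rank in ranks.items() if rank <= k}
```

For example, with four pending claims, `top_tranche(likelihoods, 0.25)` returns the single highest-ranked claim.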

Next, at step 909, the workflow processor 111 reassigns the claim to an employee for managing based upon the likelihood value output by the business logic processor 103. Lastly, at step 911, the assigned employee processes the claim according to its likelihood of being volatile. For example, the level of oversight, level of medical care and review (as described above), non-compensatory expense tolerance, and level of factual investigation for the claim may be based in part on the volatility likelihood. Alternatively, the medical care and review (as described above), non-compensatory expense tolerance, and level of factual investigation for the claim may be set automatically by the workflow processor 111 based on the volatility likelihood.
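One possible form of the automatic reassignment at step 909 is sketched below. The rule shown (require a higher handler skill level as the likelihood rises, then pick the least senior available handler who qualifies) is an assumption made for illustration, as are the five skill levels, the field names, and the function names; the specification does not prescribe a particular assignment rule.

```python
def required_skill(likelihood, levels=5):
    """Map a likelihood in [0, 1] to a required skill level from 1 to `levels`."""
    return min(levels, int(likelihood * levels) + 1)


def assign_handler(likelihood, handlers):
    """Pick an available handler whose skill level meets the requirement.

    `handlers` is a list of dicts with hypothetical keys 'name',
    'skill_level', and 'available'. Returns the handler name, or None
    if no available handler qualifies.
    """
    need = required_skill(likelihood)
    candidates = [
        h for h in handlers if h["available"] and h["skill_level"] >= need
    ]
    if not candidates:
        return None
    # Prefer the least senior qualifying handler so that senior staff
    # remain free for claims with higher likelihoods of volatility.
    return min(candidates, key=lambda h: h["skill_level"])["name"]
```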

In addition, or in the alternative to reassigning the claim at step 909, the business logic processor 103 or other computing device on the network 105, may utilize the likelihood value to adjust unassigned reserves (either IBNR or surplus reserves) maintained by the insurance company (step 913), as described further below in relation to FIG. 10.

As another additional or alternative step to step 909, the business logic processor 103 outputs a suggested course of treatment for the injured party (step 915). The suggested course of treatment is based, in one implementation, on an expert system that identifies appropriate treatments based on volatility likelihood or changes in volatility likelihood, as well as the specific condition, presentation, and medical history of the injured party. Recommended treatments are based on the specific condition and medical history of the injured party as well as, at least in part, historical treatment statistics, expert advice, and/or on appropriate clinical trials. Such recommendations may be provided, for example, to claims adjusters and/or to case management personnel to review and consider based on the specific condition and injured party involved in the claim.

FIG. 10 illustrates an exemplary optional reserving strategy which an insurance company may elect to implement based on volatility information yielded by a computerized predictive model, according to one embodiment of the invention. As described above, insurance companies typically maintain three types of reserves: claim specific reserves, IBNR reserves, and surplus reserves. The latter two reserves are not associated with any individual claim, but are instead associated with groups of claims. An IBNR (Incurred But Not Reported) reserve is an aggregate reserve that is not explicitly allocated to specific claims, but rather held as an umbrella over all claims to guard against low probability, high-loss occurrences. These reserves are maintained in case claims in a given group exceed their expected costs. In the strategy depicted in FIG. 10, claims are divided into five groups, based on their relative likelihood of volatility in comparison to other claims being processed. For example, Group I includes the lowest two deciles of claims ranked based on their likelihood of volatility. Group II includes claims in the next two deciles. Group III includes the claims ranked in the fifth and sixth deciles. Group IV includes claims with volatility likelihoods that are higher than 60% of other claims, but lower than a remaining 20% of claims. Group V includes the 20% of claims having the highest likelihoods of volatility. The depicted division of claims is merely illustrative in nature. Claims may be broken down into fewer or more groups. In addition, in alternative implementations, certain groups include a larger range of claims and others include fewer claims. For example, in one implementation, each group has incrementally fewer claims than the preceding group. Group I includes the lowest 50% of claims ranked by volatility likelihood, and Group V includes the 5% of claims having the highest likelihood of volatility. In still other implementations, claims are grouped based on their absolute likelihood of volatility.
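The relative grouping of FIG. 10 may be sketched as follows, assuming five groups of two deciles each. The function name and the `boundaries` parameter are hypothetical; passing different boundaries yields the uneven groupings mentioned above.

```python
def assign_groups(likelihoods, boundaries=(0.2, 0.4, 0.6, 0.8)):
    """Assign each claim to Group I..V according to its relative rank.

    `likelihoods` maps claim id -> volatility likelihood. A claim's
    percentile is the fraction of pending claims ranked below it, and
    `boundaries` are the percentile cut points between adjacent groups.
    """
    labels = ("I", "II", "III", "IV", "V")
    if len(boundaries) != len(labels) - 1:
        raise ValueError("need exactly four boundaries for five groups")
    ordered = sorted(likelihoods, key=likelihoods.get)  # lowest likelihood first
    n = len(ordered)
    groups = {}
    for position, claim_id in enumerate(ordered):
        percentile = position / n
        index = sum(1 for b in boundaries if percentile >= b)
        groups[claim_id] = labels[index]
    return groups
```

Because the grouping depends only on rank order, the same routine applies whether the model outputs raw scores or probabilities.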

Each group has a corresponding IBNR and surplus reserve function, by which IBNR and surplus reserves are determined for the respective group. In one implementation, the reserve function is the product of the aggregate expected costs for the claims in a group and a reserving factor. The reserving factor increases in relation to the likely volatility of the claims in the group. For example, IBNR reserves for Group I claims (those having the lowest likelihood of exhibiting volatility) may be set to 0.25 times the total expected costs of all claims in Group I. In contrast, IBNR reserves for Group V claims (those with the highest likelihood of exhibiting volatility) may be set to 2.0 times the total expected costs of all claims in Group V.
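The product-form reserve function may be illustrated as follows. Only the factors of 0.25 for Group I and 2.0 for Group V are given above; the factors shown for Groups II through IV are placeholder assumptions chosen merely to increase with volatility, and the function name is likewise hypothetical.

```python
# Reserving factors per group. The values for I and V come from the example
# in the text; the values for II, III, and IV are hypothetical placeholders.
RESERVING_FACTORS = {"I": 0.25, "II": 0.5, "III": 1.0, "IV": 1.5, "V": 2.0}


def ibnr_reserves(expected_costs, groups, factors=RESERVING_FACTORS):
    """Return {group: reserve}, where reserve = factor * total expected cost.

    `expected_costs` maps claim id -> expected cost, and `groups` maps
    claim id -> group label.
    """
    totals = {}
    for claim_id, cost in expected_costs.items():
        group = groups[claim_id]
        totals[group] = totals.get(group, 0.0) + cost
    return {group: factors[group] * total for group, total in totals.items()}
```

For instance, two Group I claims with expected costs of 1,000 and 3,000 give a Group I reserve of 0.25 × 4,000 = 1,000.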

In another embodiment of the invention, multiple, or all, of a company's insurance claims are subject to the predictive computation. In this embodiment, the predictive computation is executed consistently at a pre-set interval, for example, once a week, on claims that have reached a pre-set age (e.g., 30, 45, 60, or 90 days after notice of loss) during the time interval. These selected claims may then be managed according to their volatility likelihood. Alternatively, claims may be ranked according to their volatility likelihood, with those claims that are most likely (e.g., top 5%, 10% or 25% of claims, or top 5, 10 or 25 claims, etc.) to be volatile being managed in part according to their volatility likelihood. In this alternative, the number of pending claims may be adjusted in relation to the number of employees that are available for claim processing. Volatility likelihood for claims may be occasionally or periodically reprocessed to determine if information obtained since a previous volatility likelihood estimation alters the likelihood that the claim will be volatile. Such a change and/or the magnitude of such a change relative to a prior volatility score is then analyzed to determine whether the claim merits different management, including but not limited to a different course of medical treatment.
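The periodic selection and reprocessing just described may be sketched as follows. The sketch assumes a weekly run and a fixed change threshold of 0.15; the threshold, the function names, and the data layout are hypothetical and serve only to make the example concrete.

```python
from datetime import date, timedelta


def claims_due_for_scoring(notice_dates, run_date, interval_days=7,
                           ages=(30, 45, 60, 90)):
    """Return ids of claims that reached a pre-set age during the interval
    ending on `run_date`. `notice_dates` maps claim id -> date of notice of loss."""
    window_start = run_date - timedelta(days=interval_days)
    due = []
    for claim_id, notice in notice_dates.items():
        for age in ages:
            milestone = notice + timedelta(days=age)
            if window_start < milestone <= run_date:
                due.append(claim_id)
                break
    return due


def merits_different_management(previous_score, new_score, threshold=0.15):
    """Flag a claim whose volatility score moved by at least `threshold`
    (a hypothetical cutoff) since the previous estimate."""
    return abs(new_score - previous_score) >= threshold
```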

FIG. 11 is a flow chart of a method 1100 for underwriting customers for insurance based on predicted claim volatility. Some businesses are more prone to experiencing losses that result in volatile claims than others. For example, injuries resulting from lifting heavy objects tend to be more volatile than many other injuries. Thus, customers whose employees frequently lift heavy objects will be more prone to volatile claims than others. This volatility likelihood can be used to determine whether to extend coverage, renew such coverage, rate such coverage, and price such coverage. The volatility likelihood can also be used to adjust an underwriting workflow process. For example, an insurance company may require, and an insured might prefer, additional loss control, including investigations for customers likely to experience volatile claims.

The method 1100 includes training a computerized model configured for determining the likelihood that claims a customer or insured will incur will be volatile. The method begins with training a computerized predictive model (step 1102). In one implementation, the predictive model is of the type described above in FIGS. 2-5 and is trained on data solely related to an insurance customer, as opposed to, for example, details about specific claims or injured parties. In alternative embodiments, specific claim data is used for training the model in addition to customer specific data.

Next, an insurance company receives an application or quote request for insurance for a customer (step 1104). The application or quote includes data used to determine the likelihood that claims filed by the customer will be volatile. If the application or quote request includes insufficient data (decision block 1106), additional information is requested (step 1108).

Once sufficient data has been received, the data is processed by the predictive model (step 1110). In one implementation, the predictive model outputs a volatility indicator or score. Based on the score, an underwriting decision is output (step 1112). The underwriting decision may include a decision to extend a coverage, to deny a coverage, to recommend a different coverage, to renew a coverage, to rate a coverage, a coverage price decision, and/or an underwriting workflow decision.
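Steps 1104 through 1112 may be sketched, by way of example only, as a sufficiency check followed by scoring and a threshold-based decision. The required field names, the score thresholds, and the decision labels below are hypothetical, and `score_fn` stands in for the trained predictive model, whose form is not assumed here.

```python
# Hypothetical fields an application must supply before it can be scored.
REQUIRED_FIELDS = ("industry", "employee_count", "payroll")


def missing_fields(application, required=REQUIRED_FIELDS):
    """Return the required fields absent from the application (decision block 1106)."""
    return [field for field in required if application.get(field) is None]


def underwrite(application, score_fn, deny_above=0.8, refer_above=0.5):
    """Request more data if needed (step 1108); otherwise score the
    application (step 1110) and output a decision (step 1112)."""
    missing = missing_fields(application)
    if missing:
        return {"status": "request_information", "missing": missing}
    score = score_fn(application)
    if score > deny_above:
        decision = "deny"
    elif score > refer_above:
        decision = "extend_with_loss_control"
    else:
        decision = "extend"
    return {"status": "decided", "score": score, "decision": decision}
```

A pricing or rating decision could be derived from the same score in place of, or in addition to, the labels shown.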

The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative, rather than limiting of the invention.

1. A system for analyzing data comprising: an input module for receiving at least one parameter corresponding to a characteristic of an insurance claim; a database for storing the parameter received by the input module; a computerized predictive model for determining a volatility indicator for the insurance claim based on the parameter; and one or more business logic processors for: executing the predictive model; and processing the insurance claim based upon the volatility indicator determined by the computerized predictive model.
 2. The system of claim 1, wherein processing the insurance claim comprises adjusting a collective reserve for a group of claims having similar volatility indicators based on the volatility of the claims in the group.
 3. The system of claim 1, wherein the computerized predictive model is configured for updating itself after at least one new insurance claim cost has been determined.
 4. The system of claim 1, wherein the one or more business logic processors are configured for recommending a course of treatment for the claim based on its associated volatility indicator.
 5. The system of claim 1, wherein the one or more business logic processors are configured for: detecting a change in a volatility indicator associated with a claim, and recommending a course of treatment for the claim based on the detected change.
 6. The system of claim 1, wherein processing the insurance claim comprises making a workflow determination for the insurance claim based upon the volatility indicator.
 7. The system of claim 6, wherein the workflow determination comprises an assignment of the insurance claim to an employee from a plurality of employees to handle the claim based upon the volatility indicator.
 8. The system of claim 6, wherein the workflow determination comprises a selection of an investigation level for the insurance claim based upon the volatility indicator.
 9. The system of claim 8, wherein the selection of the investigation level comprises determining whether to engage a private investigator to investigate the claim.
 10. The system of claim 8, wherein the selection of the investigation level comprises determining whether to engage an independent medical examiner to investigate the claim.
 11. The system of claim 6, wherein the workflow determination comprises a selection of a level of medical review for the insurance claim based upon the volatility indicator.
 12. The system of claim 6, wherein the workflow determination comprises a selection of a level of medical care for the insurance claim based on the volatility indicator.
 13. The system of claim 1, wherein: the database is configured such that the at least one parameter may be updated; at least one of the one or more processors is configured to re-execute the computerized predictive model in response to the at least one parameter being updated to determine a new volatility indicator; and at least one of the one or more processors is configured to process the insurance claim based upon the new volatility indicator.
 14. The system of claim 1, wherein the computerized predictive model is generated based at least in part on an analysis of medical spending costs for a plurality of claims.
 15. The system of claim 1, wherein the computerized predictive model determines the volatility indicator at least in part on data extracted from unstructured text by a text mining application.
 16. The system of claim 1, wherein the computerized predictive model comprises a collection of decision trees.
 17. The system of claim 1, wherein at least one of the one or more business logic processors is configured for training the computerized predictive model.
 18. The system of claim 1, wherein at least one of the one or more business logic processors is configured to: segment insurance claims based on their respective volatility indicators into groups; and set unassigned reserves for each group as a function of the volatility of insurance claims in the respective segment.
 19. The system of claim 18, wherein the at least one of the one or more business logic processors is configured to set surplus reserves for each segment as a function of the volatility of insurance claims in the respective segment.
 20. A system for analyzing data comprising: a database for storing: a plurality of characteristics for an insurance-seeking company and for a plurality of insured companies, and insurance claims histories for the plurality of insured companies; a computerized predictive model trained on the data stored in the database for the plurality of insured companies to detect companies likely to have volatile claims; and one or more business logic processors for: underwriting the insurance-seeking company based on the plurality of parameters received for the insurance-seeking company and the computerized predictive model, and outputting an underwriting decision based on the underwriting.
 21. The system of claim 20, wherein the underwriting decision comprises a premium price.
 22. The system of claim 20, wherein the underwriting decision comprises a classification.
 23. The system of claim 20, wherein the underwriting decision comprises one of a decision to extend coverage, a decision to deny coverage, a decision to renew coverage, and a decision to adjust an underwriting workflow process.
 24. A method for analyzing data comprising: receiving by an input module at least one parameter corresponding to a characteristic of an insurance claim; storing by a processor the parameter in a database; executing by the processor a computerized predictive model based at least in part on the parameter to determine a volatility indicator for the insurance claim; and processing the insurance claim based upon the volatility indicator determined by the computerized predictive model.
 25. The method of claim 24, wherein processing the insurance claim comprises adjusting a collective reserve for a group of claims having similar volatility indicators based on the volatility of the claims in the group.
 26. The method of claim 24, wherein the computerized predictive model is configured for updating itself after at least one new insurance claim cost has been determined.
 27. The method of claim 24, further comprising recommending a course of treatment for the claim based on its associated volatility indicator.
 28. The method of claim 24, further comprising: detecting a change in a volatility indicator associated with a claim, and recommending a course of treatment for the claim based on the detected change.
 29. The method of claim 24, wherein processing the insurance claim comprises making a workflow determination for the insurance claim based upon the volatility indicator.
 30. The method of claim 29, wherein the workflow determination comprises an assignment of the insurance claim to an employee from a plurality of employees to handle the claim based upon the volatility indicator.
 31. The method of claim 29, wherein the workflow determination comprises a selection of an investigation level for the insurance claim based upon the volatility indicator.
 32. The method of claim 31, wherein the selection of the investigation level comprises determining whether to engage a private investigator to investigate the claim.
 33. The method of claim 31, wherein the selection of the investigation level comprises determining whether to engage an independent medical examiner to investigate the claim.
 34. The method of claim 29, wherein the workflow determination comprises a selection of a level of medical review for the insurance claim based upon the volatility indicator.
 35. The method of claim 29, wherein the workflow determination comprises a selection of a level of medical care for the insurance claim based on the volatility indicator.
 36. The method of claim 24, further comprising: updating the parameter stored in the database; re-executing the computerized predictive model in response to the parameter update to determine a new volatility indicator; and processing the insurance claim based upon the new volatility indicator.
 37. The method of claim 24, wherein the computerized predictive model is generated based at least in part on an analysis of medical spending costs for a plurality of claims.
 38. The method of claim 24, wherein the computerized predictive model determines the volatility indicator at least in part on data extracted from unstructured text by a text mining application.
 39. The method of claim 24, wherein the computerized predictive model comprises a collection of decision trees.
 40. The method of claim 24, wherein the computerized predictive model is generated from training data.
 41. The method of claim 24, further comprising: segmenting insurance claims based on their respective volatility indicators into segments; and setting unassigned reserves for each segment as a function of the volatility of insurance claims in the respective segment.
 42. The method of claim 41, further comprising setting surplus reserves for each segment as a function of the volatility of insurance claims in the respective segment.
 43. A method for analyzing data comprising: storing by a processor in a database a plurality of parameters for an insurance-seeking company and for a plurality of insured companies, and insurance claims histories for the plurality of insured companies; training by a processor a computerized predictive model on the data stored in the database for the plurality of insured companies to detect companies likely to have volatile claims; underwriting by a processor the insurance-seeking company based on the plurality of parameters received for the insurance-seeking company and the computerized predictive model; and outputting an underwriting decision based on the underwriting.
 44. The method of claim 43, wherein the underwriting decision comprises a premium price.
 45. The method of claim 43, wherein the underwriting decision comprises a classification.
 46. The method of claim 43, wherein the underwriting decision comprises one of a decision to extend coverage, a decision to deny coverage, a decision to renew coverage, and a decision to adjust an underwriting workflow process.
 47. A computer-readable medium encoded with machine-readable instructions for analyzing data, the machine-readable instructions comprising: receiving at least one parameter corresponding to a characteristic of an insurance claim; storing the parameter in a database; executing a computerized predictive model based at least in part on the parameter to determine a volatility indicator for the insurance claim; and processing the insurance claim based upon the volatility indicator determined by the computerized predictive model.
 48. The computer-readable medium of claim 47, wherein processing the insurance claim comprises making a workflow determination for the insurance claim based upon the volatility indicator.
 49. The computer-readable medium of claim 47, wherein the computerized predictive model comprises a collection of decision trees.
 50. A computer-readable medium encoded with machine-readable instructions for analyzing data, the machine-readable instructions comprising: storing in a database a plurality of parameters for an insurance-seeking company and for a plurality of insured companies, and insurance claims histories for the plurality of insured companies; training a computerized predictive model on the data stored in the database for the plurality of insured companies to detect companies likely to have volatile claims; underwriting the insurance-seeking company based on the plurality of parameters received for the insurance-seeking company and the computerized predictive model; and outputting an underwriting decision based on the underwriting.
 51. The computer-readable medium of claim 50, wherein the underwriting decision comprises one of a decision to extend coverage, a decision to deny coverage, a decision to renew coverage, and a decision to adjust an underwriting workflow process.
 52. The computer-readable medium of claim 50, wherein the computerized predictive model comprises a collection of decision trees. 
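
For illustration only, the steps recited in claims 24, 29, 36, and 39 (storing a claim parameter, executing a predictive model made up of a collection of decision trees, making a workflow determination from the resulting volatility indicator, and re-executing the model when a parameter is updated) might be sketched as follows. This is a minimal sketch, not the claimed implementation. The feature names, thresholds, scores, workflow labels, and the names `stump`, `ENSEMBLE`, `ClaimStore`, and `process_claim` are all hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Claim = Dict[str, float]          # parameter name -> value
Tree = Callable[[Claim], float]   # a tree maps claim parameters to a score in [0, 1]


def stump(feature: str, threshold: float, low: float, high: float) -> Tree:
    """Build a one-split decision tree (hypothetical, hand-set rather than trained)."""
    def predict(params: Claim) -> float:
        # Missing parameters are treated as 0.0 in this sketch.
        return high if params.get(feature, 0.0) > threshold else low
    return predict


# Hypothetical "collection of decision trees"; all values are illustrative.
ENSEMBLE: List[Tree] = [
    stump("claimant_age", 50, 0.2, 0.6),
    stump("prior_claims", 2, 0.1, 0.7),
    stump("days_to_report", 14, 0.3, 0.8),
]


def volatility_indicator(params: Claim) -> float:
    """Average the scores of all trees in the ensemble."""
    return sum(tree(params) for tree in ENSEMBLE) / len(ENSEMBLE)


def workflow_for(indicator: float) -> str:
    """Map the volatility indicator to a hypothetical handling workflow."""
    if indicator >= 0.6:
        return "senior_adjuster_with_medical_review"
    if indicator >= 0.35:
        return "standard_adjuster"
    return "auto_process"


@dataclass
class ClaimStore:
    """In-memory stand-in for the database that stores claim parameters."""
    rows: Dict[str, Claim] = field(default_factory=dict)

    def save(self, claim_id: str, params: Claim) -> None:
        # Merge new or updated parameters into the stored record.
        self.rows.setdefault(claim_id, {}).update(params)

    def load(self, claim_id: str) -> Claim:
        return dict(self.rows[claim_id])


def process_claim(store: ClaimStore, claim_id: str, params: Claim):
    """Store the parameters, (re-)execute the model, and pick a workflow."""
    store.save(claim_id, params)
    indicator = volatility_indicator(store.load(claim_id))
    return indicator, workflow_for(indicator)


if __name__ == "__main__":
    store = ClaimStore()
    # Initial intake of the claim.
    print(process_claim(store, "C-1", {"claimant_age": 42, "prior_claims": 0, "days_to_report": 3}))
    # A later parameter update triggers re-execution and a new indicator.
    print(process_claim(store, "C-1", {"prior_claims": 4, "days_to_report": 30}))
```

Calling `process_claim` a second time with updated parameters re-executes the model on the merged record, so the workflow determination can change as new information about the claim arrives.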